1. Effective size of support

Let X be a discrete random variable which can take on values from a
finite set $\mathcal{X}$ of m elements, with probabilities specified by the
probability mass function (pmf) p. The support of X is the set
$S(p(X)) = \{p_i : p_i > 0,\ i = 1, 2, \dots, m\}$. Let $|S(p(X))|$ denote
the size of the support.

While the pmf p = [0.5, 0.5] makes both outcomes equally likely, the pmf
q = [0.999, 0.001] characterizes a random variable that takes on almost
exclusively one of its two values. However, both p and q have the same size
of support. This motivates a need for a quantity that measures the size of
support of a random variable in a different way, so that the random variable
can be placed, according to its pmf, in the range [1, m]. We will call the
new quantity/measure the effective support size (Ess), and denote it by
$\mathbb{S}(p(X))$, or $\mathbb{S}(p)$ for short. The example makes it
obvious that $\mathbb{S}(\cdot)$ should be such that $\mathbb{S}(q)$ is close
to 1, while to p it assigns the value $\mathbb{S}(p) = 2$.

2. Properties of Ess 

Ess should have certain properties, dictated by common sense. 

P1) $\mathbb{S}(p)$ should be a continuous, symmetric function.

P2) $\mathbb{S}(\delta_m) = 1 \le \mathbb{S}(p_m) \le \mathbb{S}(u_m) = m$,
where $u_m$ denotes the uniform pmf on an m-element support, $\delta_m$
denotes an m-element pmf with probability concentrated at one point, and
$p_m$ denotes a pmf with $|S(p)| = m$.

P3) $\mathbb{S}([p_m, 0]) = \mathbb{S}(p_m)$.

P4) $\mathbb{S}(p(X,Y)) = \mathbb{S}(p(X))\,\mathbb{S}(p(Y))$, if X and Y are
independent random variables.

The first two properties are obvious. The third one states that extending
the support by an impossible outcome should leave Ess unchanged. Only the
fourth property needs, perhaps, a little discussion. Or, better, an example.
Let p(X) = [1, 1, 1]/3 and p(Y) = [1, 1]/2, and let X be independent of Y.
Then p(X, Y) = [1, 1, 1, 1, 1, 1]/6. According to P2), $\mathbb{S}(p(X)) = 3$,
$\mathbb{S}(p(Y)) = 2$ and
$\mathbb{S}(p(X,Y)) = 6 = \mathbb{S}(p(X))\,\mathbb{S}(p(Y))$. It is
reasonable to require the product relationship to hold for independent
random variables with arbitrary distributions.

The properties P1)-P4) are satisfied by
$\mathbb{S}(p, \alpha) = \left(\sum_{i=1}^{m} p_i^{\alpha}\right)^{\frac{1}{1-\alpha}}$,
where $\alpha$ is a positive real number different from 1. Note that
$\mathbb{S}(\cdot)$ of this form is the exponential of Rényi's entropy. For
$\alpha \to 1$, $\mathbb{S}(p, \alpha)$ also satisfies P1)-P4) and takes the
form $\exp(H(p))$, where $H(p) = -\sum_{i=1}^{m} p_i \log p_i$ is Shannon's
entropy. It is thus reasonable to define $\mathbb{S}(p, \alpha)$ for
$\alpha = 1$ this way (with the convention $0 \log 0 = 0$), so that
$\mathbb{S}(\cdot)$ then becomes a continuous function of $\alpha$.

Table 1: $\mathbb{S}(p, \alpha)$ for $\alpha$ = 0.001, 0.1, 0.5, 0.9, 1.0,
1.5, 2.0, 10, $\infty$ and different p's.

                                   pmf p
$\alpha$    [0.5,0.5]  [0.6,0.4]  [0.7,0.3]  [0.8,0.2]  [0.9,0.1]  [1.0,0.0]
0.001       2.000000   1.999959   1.999826   1.999554   1.998979   1.000000
0.1         2.000000   1.995925   1.982696   1.956233   1.902332   1.000000
0.5         2.000000   1.979796   1.916515   1.800000   1.600000   1.000000
0.9         2.000000   1.964013   1.856116   1.675654   1.416403   1.000000
1.0         2.000000   1.960132   1.842023   1.649385   1.384145   1.000000
1.5         2.000000   1.941178   1.777878   1.543210   1.275510   1.000000
2.0         2.000000   1.923077   1.724138   1.470588   1.219512   1.000000
10.0        2.000000   1.760634   1.486289   1.281379   1.124195   1.000000
$\infty$    2.000000   1.666666   1.428571   1.250000   1.111111   1.000000
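The family just defined is easy to evaluate numerically. The following is a minimal sketch, not the author's code: the function names `ess` and `joint_independent` and the test pmfs are hypothetical, and the branch for infinite $\alpha$ uses the limiting value $1/\max_i p_i$. It evaluates $\mathbb{S}(p, \alpha)$, including the $\alpha = 1$ case defined through Shannon's entropy, and spot-checks P2)-P4).

```python
import math

def ess(p, alpha):
    """Effective support size S(p, alpha) of a discrete pmf p (alpha > 0)."""
    probs = [x for x in p if x > 0]      # zero-probability outcomes contribute nothing
    if alpha == 1:                       # alpha = 1: exp of Shannon's entropy
        return math.exp(-sum(x * math.log(x) for x in probs))
    if math.isinf(alpha):                # assumed limiting value: 1 / max p_i
        return 1.0 / max(probs)
    return sum(x ** alpha for x in probs) ** (1.0 / (1.0 - alpha))

def joint_independent(px, py):
    """Joint pmf of independent X and Y, flattened to a list."""
    return [a * b for a in px for b in py]

# Hypothetical marginals, chosen only for illustration.
px, py = [0.7, 0.2, 0.1], [0.6, 0.4]

for alpha in (0.5, 1, 2, math.inf):
    lhs = ess(joint_independent(px, py), alpha)
    rhs = ess(px, alpha) * ess(py, alpha)
    print(alpha, lhs, rhs)               # P4: the two values agree up to rounding

print(ess([1 / 3, 1 / 3, 1 / 3], 2))                  # P2: uniform pmf on 3 points
print(ess([0.7, 0.3, 0.0], 2), ess([0.7, 0.3], 2))    # P3: appended zero changes nothing
```

Calling `ess` on the pmfs and $\alpha$ values of Table 1 should reproduce the tabulated entries up to rounding.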

3. Selecting $\alpha$

The requirements define an entire class of measures of effective support
size. This opens the problem of selecting $\alpha$. In Table 1,
$\mathbb{S}(p, \alpha)$ is given for various two-element pmf's and
$\alpha$ = 0.001, 0.1, 0.5, 0.9, 1.0, 1.5, 2.0, 10, $\infty$. The value
$\mathbb{S}(p, \alpha \to \infty)$ can be found analytically.
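One way to obtain the limiting value is sketched below; the symbol $p_{\max}$ is introduced here only for the derivation and does not appear in the original text.

```latex
\[
\mathbb{S}(p,\alpha)
  = \Bigl(\sum_{i=1}^{m} p_i^{\alpha}\Bigr)^{\frac{1}{1-\alpha}}
  = p_{\max}^{\frac{\alpha}{1-\alpha}}
    \Bigl(\sum_{i=1}^{m} \bigl(p_i/p_{\max}\bigr)^{\alpha}\Bigr)^{\frac{1}{1-\alpha}},
\qquad p_{\max} = \max_i p_i .
\]
% The sum in the second factor lies between 1 and m for every alpha > 0,
% and its exponent 1/(1-alpha) tends to 0, so the second factor tends to 1.
% The exponent alpha/(1-alpha) of the first factor tends to -1. Hence
\[
\lim_{\alpha\to\infty} \mathbb{S}(p,\alpha) = \frac{1}{\max_i p_i}.
\]
```

For p = [0.9, 0.1] this gives 1/0.9 = 1.111111, in agreement with the last row of Table 1.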

From the table it can be seen that the smaller the $\alpha$, the more
$\mathbb{S}(\cdot, \alpha)$ ignores the actual difference between
probabilities. For p = [0.9, 0.1] the difference is 0.8, yet
$\mathbb{S}(p, 0.001) = 1.998979$, i.e., it interprets the pmf as being very
close to [0.5, 0.5].

Based on the table, we would opt for $\mathbb{S}(\cdot, \alpha \to \infty)$
as the good measure of Ess. However, for larger $|S|$ this choice becomes
less attractive. This can be seen easily from a consideration of continuous
random variables.

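4. Selecting $\alpha$: continuous case

In the case of a continuous random variable,
$\mathbb{S}(f(x), \alpha) = \left(\int f^{\alpha}(x)\,dx\right)^{\frac{1}{1-\alpha}}$.
For the Gaussian $n(\mu, \sigma^2)$ distribution,
$\mathbb{S}(\cdot, \alpha) = \sqrt{2\pi\sigma^2}\,\alpha^{\frac{1}{2(\alpha-1)}}$;
cf. [3]. For $\alpha \to \infty$ this converges to $\sqrt{2\pi\sigma^2}$, so
that for $\sigma^2 = 1$ it becomes $\sqrt{2\pi} = 2.5066$. It is worth
comparing with $\mathbb{S}(\cdot, \alpha = 1) = \sqrt{2\pi e \sigma^2}$
(cf. [1]), which reduces in the case of $\sigma^2 = 1$ to 4.1327. This makes
much more sense.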
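The Gaussian values can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names, the midpoint rule, the truncation of the integral to [-20, 20] and the grid size are choices made here, not taken from the text. It compares the integral definition of $\mathbb{S}(f, \alpha)$ with the closed form $\sqrt{2\pi\sigma^2}\,\alpha^{1/(2(\alpha-1))}$ for a standard Gaussian.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of n(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def ess_continuous(pdf, alpha, lo, hi, n=40_000):
    """(integral of pdf^alpha over [lo, hi])^(1/(1-alpha)), midpoint rule, alpha != 1."""
    h = (hi - lo) / n
    integral = h * sum(pdf(lo + (k + 0.5) * h) ** alpha for k in range(n))
    return integral ** (1.0 / (1.0 - alpha))

def ess_gaussian_closed_form(alpha, sigma=1.0):
    """sqrt(2 pi sigma^2) * alpha^(1 / (2 (alpha - 1))), for alpha > 0, alpha != 1."""
    return math.sqrt(2.0 * math.pi) * sigma * alpha ** (1.0 / (2.0 * (alpha - 1.0)))

for alpha in (0.5, 2.0, 10.0, 100.0):
    numeric = ess_continuous(gaussian_pdf, alpha, -20.0, 20.0)
    print(alpha, numeric, ess_gaussian_closed_form(alpha))

# The two limiting values quoted in the text for sigma^2 = 1.
print(math.sqrt(2.0 * math.pi), math.sqrt(2.0 * math.pi * math.e))
```

As $\alpha$ grows, the printed values approach $\sqrt{2\pi}$ from above, while for $\alpha$ close to 1 the closed form approaches $\sqrt{2\pi e}$.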
That $\mathbb{S}(\cdot, \alpha \to \infty)$ is not the appropriate measure of
Ess can be seen even more clearly in the case of the Exponential
distribution. For $\beta e^{-\beta x}$ with $\beta = 1$,
$\mathbb{S}(\cdot, \alpha \to \infty) = 1$ while
$\mathbb{S}(\cdot, \alpha = 1) = e$.
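Both values follow from the integral definition of $\mathbb{S}(f, \alpha)$ for continuous random variables. A short derivation, given as a sketch for general rate $\beta > 0$:

```latex
\[
\int_0^{\infty} \bigl(\beta e^{-\beta x}\bigr)^{\alpha}\,dx
  = \frac{\beta^{\alpha}}{\alpha\beta}
  = \frac{\beta^{\alpha-1}}{\alpha},
\qquad
\mathbb{S}(\cdot,\alpha)
  = \Bigl(\frac{\beta^{\alpha-1}}{\alpha}\Bigr)^{\frac{1}{1-\alpha}}
  = \frac{\alpha^{\frac{1}{\alpha-1}}}{\beta}.
\]
% alpha^{1/(alpha-1)} tends to 1 as alpha -> infinity and to e as alpha -> 1, so
\[
\lim_{\alpha\to\infty}\mathbb{S}(\cdot,\alpha) = \frac{1}{\beta},
\qquad
\lim_{\alpha\to 1}\mathbb{S}(\cdot,\alpha) = \frac{e}{\beta}.
\]
```

With $\beta = 1$ these reduce to 1 and e, respectively.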

5. Adding another property 

The above considerations suggest that $\mathbb{S}(\cdot, \alpha = 1)$ might
be the most appropriate of the Ess measures which satisfy the requirements
P1)-P4). The question is whether there is some other requirement that is
reasonable to add to the already employed properties, such that it could
narrow down the set of feasible $\mathbb{S}(\cdot, \alpha)$ to
$\mathbb{S}(\cdot, 1)$.

To this end, let us consider two random variables X, Y that are dependent.
Let $p(Y|X)$ be the conditional distribution and $p(X,Y)$ the joint
distribution. For any of them its Ess can be obtained by
$\mathbb{S}(\cdot, \alpha)$. For instance, let X take on two values
$x_1, x_2$. Then $\mathbb{S}(p(Y|X = x_1), \alpha)$ is the Ess of the
conditional distribution of Y given that X has taken the value $x_1$.

In analogy with P4) it seems reasonable to define the Ess for a mean of the
conditional distributions, $\mathbb{S}(p(Y|X), \alpha)$, as
$\mathbb{S}(p(Y|X), \alpha) = \frac{\mathbb{S}(p(X,Y), \alpha)}{\mathbb{S}(p(X), \alpha)}$.
Note that $\mathbb{S}(p(Y|X), \alpha)$ is the same regardless of what value
the conditioning variable X has taken. This is why it is a kind of Ess for a
mean of the conditional distributions. Note also that when X and Y are
independent the definition reduces to the requirement P4).

Now, once the new object is defined, one might wonder whether it can be
related to the Ess's of the conditional distributions. For $\alpha = 1$ such
a relationship indeed exists:

$$\mathbb{S}(p(Y|X), 1) = \prod_{i=1}^{n} \mathbb{S}(p(Y|X = x_i), 1)^{P(X = x_i)}. \qquad (1)$$

If Eq. (1) were turned into a fifth requirement, then by invoking Khinchin's
[2] uniqueness theorem (which characterizes Shannon's entropy), it can be
claimed that $\mathbb{S}(\cdot, 1)$ is the only Ess which satisfies the
enhanced set of requirements.
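Eq. (1) can be checked numerically on a small example. The following sketch assumes a hypothetical 2x2 joint pmf and hypothetical helper names; it takes the Ess of the mean of the conditional distributions to be the ratio of the joint Ess to the marginal Ess, in analogy with P4), and compares it with the weighted geometric mean of the conditional Ess's at $\alpha = 1$.

```python
import math

def ess1(p):
    """S(p, 1) = exp(Shannon entropy of p), with the convention 0 log 0 = 0."""
    return math.exp(-sum(x * math.log(x) for x in p if x > 0))

def conditional_ess_ratio(joint):
    """S(p(X,Y), 1) / S(p(X), 1); rows of `joint` index the values of X."""
    px = [sum(row) for row in joint]
    flat = [v for row in joint for v in row]
    return ess1(flat) / ess1(px)

def conditional_ess_geometric(joint):
    """Right-hand side of Eq. (1): product of S(p(Y|X=x_i), 1)^P(X=x_i)."""
    px = [sum(row) for row in joint]
    return math.prod(
        ess1([v / pxi for v in row]) ** pxi   # conditional pmf p(Y | X = x_i)
        for row, pxi in zip(joint, px)
        if pxi > 0
    )

# Hypothetical joint pmf of two dependent binary variables.
joint = [[0.30, 0.10],
         [0.05, 0.55]]

print(conditional_ess_ratio(joint), conditional_ess_geometric(joint))
```

The two printed numbers coincide up to rounding, which is the chain rule for Shannon's entropy, H(X,Y) = H(X) + H(Y|X), written in exponential form.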

It should be added, however, that Eq. (1) is not the only conceivable
relationship between $\mathbb{S}(p(Y|X))$ and the Ess's of the conditional
distributions. Instead of the form of a weighted geometric mean, the
relationship could for instance take the form of a weighted arithmetic mean.
Whether in this case there is some $\alpha$ which could satisfy the
relationship remains an open problem (at least for the present author).

6. Summary 

In this speculation we entertained the newly introduced concept of effective
support size (Ess). There are some obvious requirements P1)-P4) that Ess has
to satisfy. The class of Ess measures
$\mathbb{S}(\cdot, \alpha) = \left(\sum_{i=1}^{m} p_i^{\alpha}\right)^{\frac{1}{1-\alpha}}$
which satisfies the requirements is broad. The Ess measures are in a direct
relationship to the family of Rényi's entropies, which includes Shannon's
entropy as a special case. We have briefly addressed the issue of selecting
$\alpha$ such that the corresponding $\mathbb{S}(\cdot, \alpha)$ would be the
most 'appropriate' measure of Ess. The considerations indicate that
$\alpha = 1$ could, perhaps, be the most reasonable candidate. If Eq. (1)
were added to the set of requirements, then $\mathbb{S}(\cdot, 1)$ would
become the only $\mathbb{S}(\cdot)$ that satisfies them. However, there are
also other conceivable relationships between $\mathbb{S}(p(Y|X))$ and the
conditional $\mathbb{S}(\cdot)$. Whether some of them could be satisfied by
$\mathbb{S}(\cdot, \alpha)$ for some other $\alpha$ remains an open question.
In any case, with the concept of Ess it is possible to enter a meaningful
world which is in a sense dual to that of entropies.